Zurich by the Numbers - Predictive Insights into Tourism Dynamics
Authors
Affiliation
Name I, First Name I
University of Lausanne
Name II, First Name II
Published
May 15, 2024
Abstract
This forecasting project applies time series techniques to predict tourism trends in Zurich. The analysis combines historical data with forecasting algorithms to provide actionable insights into future tourism patterns. We prepare the data, explore several predictive models, and evaluate their forecasting accuracy in detail. The project addresses the challenge of turning complex data into understandable, strategic information that supports decision-making in Zurich’s tourism sector.
1 Exploration & Visualization
1.1 Objectives
The main objectives of this project are to predict:
- The overnight stays of visitors in Vaud, from October 2023 until December 2024.
- The overnight stays of visitors from the Philippines to Zürich, from October 2023 until December 2024.
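Both objectives cover the same forecast window. As a quick check of the horizon this implies, the sketch below counts the months from October 2023 to December 2024 in base R, assuming the last observed month is September 2023. The variable names are illustrative only.

```r
# Forecast window implied by the objectives: October 2023 to December 2024.
first_month <- as.Date("2023-10-01")
last_month  <- as.Date("2024-12-01")

# One entry per month to forecast.
forecast_months <- seq(first_month, last_month, by = "month")

# Forecast horizon, usable as the h argument of a forecast call.
h <- length(forecast_months)
h
```

A horizon of this length covers the whole target window; a longer horizon such as 24 months simply extends the forecast beyond December 2024.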
2 DATA
2.1 Cleaning & Wrangling
2.1.1 Tourism Data - General Overview
The dataset contains information about the overnight stays by tourists in the various Swiss cantons. It indicates the tourist’s country of origin, the canton of stay, the month, the year and the total number of overnight stays.
As the dataset provided is in German, we translated the month names into English to make the data more intuitive and understandable for everyone. Then, we created a new ‘Date’ column in year-month-day format, which is the format required to build time series and make predictions.
# Packages used in this section
library(here)
library(dplyr)
library(lubridate)

# Load the data from data/Dataset_tourism.xlsx
tourism_data <- readxl::read_xlsx(here("data/Dataset_tourism.xlsx"))

# Print the unique values of the month column
unique(tourism_data$Monat)
#>  [1] "Januar"    "Februar"   "März"      "April"     "Mai"
#>  [6] "Juni"      "Juli"      "August"    "September" "Oktober"
#> [11] "November"  "Dezember"

# Translate the German month names into English
tourism_data$Monat <- tourism_data$Monat %>%
  recode_factor("Januar" = "January", "Februar" = "February", "März" = "March",
                "April" = "April", "Mai" = "May", "Juni" = "June",
                "Juli" = "July", "August" = "August", "September" = "September",
                "Oktober" = "October", "November" = "November", "Dezember" = "December")

# Add a Date column (first day of each month) for plotting purposes
tourism_data <- tourism_data %>%
  mutate(Date = dmy(paste("01", Monat, Jahr)))

# Filter out 'Herkunftsland - Total' in column 'Herkunftsland', as it is just the total
tourism_data_no_total <- tourism_data %>%
  filter(Herkunftsland != "Herkunftsland - Total")

# Count the missing values
sum(is.na(tourism_data_no_total))
#> [1] 51395

# Inspect the rows with missing values
tourism_data_no_total %>% filter(is.na(value))
#> # A tibble: 51,395 x 6
#>    Herkunftsland                  Kanton Monat Jahr  value Date
#>    <chr>                          <chr>  <fct> <chr> <dbl> <date>
#>  1 Malta                          Schwe~ Janu~ 2005     NA 2005-01-01
#>  2 Zypern                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  3 Mexiko                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  4 Übriges Zentralamerika, Karib~ Schwe~ Janu~ 2005     NA 2005-01-01
#>  5 Bahrain                        Schwe~ Janu~ 2005     NA 2005-01-01
#>  6 Katar                          Schwe~ Janu~ 2005     NA 2005-01-01
#>  7 Kuwait                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  8 Australien                     Schwe~ Janu~ 2005     NA 2005-01-01
#>  9 Neuseeland, Ozeanien           Schwe~ Janu~ 2005     NA 2005-01-01
#> 10 Oman                           Schwe~ Janu~ 2005     NA 2005-01-01
#> # i 51,385 more rows

# Show the first 1,000 rows in a searchable table using reactable
reactable::reactable(head(tourism_data_no_total, 1000), searchable = TRUE)
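The date construction above relies on translating the month names and parsing them as text, which depends on the session locale. As an alternative sketch, the month number can be looked up directly from the German names, which avoids locale-dependent parsing. The helper `make_month_date()` is hypothetical and not part of the original code.

```r
# German month names in calendar order, as they appear in the 'Monat' column.
german_months <- c("Januar", "Februar", "März", "April", "Mai", "Juni",
                   "Juli", "August", "September", "Oktober", "November", "Dezember")

# Hypothetical helper: build the first day of the month as a Date.
make_month_date <- function(monat, jahr) {
  month_number <- match(monat, german_months)  # position 1..12 in the calendar
  as.Date(ISOdate(as.integer(jahr), month_number, 1))
}

# Example with the same kind of inputs as the 'Monat' and 'Jahr' columns.
example_dates <- make_month_date(c("Januar", "März", "Dezember"),
                                 c("2005", "2005", "2023"))
example_dates
```

With this approach the original German labels can stay untouched in the data, and the translation step becomes purely cosmetic.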
2.1.2 Tourism Data - Vaud
Given the two objectives of the project, we are going to filter the initial dataset in order to keep and analyse only the cantons of interest. We start by filtering the “Kanton” column to keep only the canton of Vaud.
# Filter by canton Vaud
tourism_vaud <- tourism_data %>%
  filter(Kanton == "Vaud")

# Count the missing values
sum(is.na(tourism_vaud))
#> [1] 1869

# Show the first 20 rows in a table using reactable
reactable::reactable(head(tourism_vaud, 20))
2.1.3 Tourism Data - Zurich
We filtered the “Kanton” column to keep only the canton of Zurich.
# Filter column 'Kanton' for Zurich
tourism_data_zurich <- tourism_data_no_total %>%
  filter(Kanton == "Zürich")

# Count the missing values
sum(is.na(tourism_data_zurich))
#> [1] 1869

# Inspect the rows with missing values
tourism_data_zurich %>% filter(is.na(value))
#> # A tibble: 1,869 x 6
#>    Herkunftsland                  Kanton Monat Jahr  value Date
#>    <chr>                          <chr>  <fct> <chr> <dbl> <date>
#>  1 Malta                          Zürich Janu~ 2005     NA 2005-01-01
#>  2 Zypern                         Zürich Janu~ 2005     NA 2005-01-01
#>  3 Mexiko                         Zürich Janu~ 2005     NA 2005-01-01
#>  4 Übriges Zentralamerika, Karib~ Zürich Janu~ 2005     NA 2005-01-01
#>  5 Bahrain                        Zürich Janu~ 2005     NA 2005-01-01
#>  6 Katar                          Zürich Janu~ 2005     NA 2005-01-01
#>  7 Kuwait                         Zürich Janu~ 2005     NA 2005-01-01
#>  8 Australien                     Zürich Janu~ 2005     NA 2005-01-01
#>  9 Neuseeland, Ozeanien           Zürich Janu~ 2005     NA 2005-01-01
#> 10 Oman                           Zürich Janu~ 2005     NA 2005-01-01
#> # i 1,859 more rows

# Show the first 1,000 rows in a table using reactable
reactable::reactable(head(tourism_data_zurich, 1000))
Each of the two sub-datasets contains 1869 missing values. They all come from the ‘value’ column and create gaps in the time series. We will see later how to handle them before modelling.
2.1.4 Tourism Data - Zurich and Philippines
We filter the ‘Kanton’ and ‘Herkunftsland’ columns, keeping Zürich as the canton and ‘Philippinen’ as the country of origin.
# Keep only visitors from the Philippines
tourism_data_zurich_philippines <- tourism_data_zurich %>%
  filter(Herkunftsland == "Philippinen")

# Show the data in a table using reactable
reactable::reactable(tourism_data_zurich_philippines)
Filtering for ‘Philippinen’ solved the problem of missing data we had with all countries of origin. The overnight stays are all included throughout the period.
Had there been missing values, we would have used one of two approaches. First, we could simply keep the section of data after the last missing value, provided that the remaining series is long enough to produce meaningful predictions. Second, we could replace the missing values with estimates: we first fit an ARIMA model to the data containing missing values, and then use the model to interpolate the missing observations.
# Not run: the Philippine series has no gaps, so interpolation is not needed.
# # Create a monthly tsibble and make implicit gaps explicit
# data <- tourism_data_zurich_philippines %>%
#   mutate(Month = tsibble::yearmonth(Date)) %>%
#   as_tsibble(index = Month, key = c(Kanton, Herkunftsland)) %>%
#   select(-Monat, -Jahr, -Date) %>%
#   fill_gaps()
#
# # Fit an ARIMA model to the data with missing values
# model_fit <- data %>%
#   model(ARIMA(value))
#
# # Interpolate the missing values using the fitted ARIMA model
# filled_data <- model_fit %>%
#   interpolate(data)
#
# # Print the data with the missing values filled in
# print(filled_data)
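To make the two options concrete, here is a minimal base R sketch on made-up values. The first option is shown as described. For the second, linear interpolation with `approx()` is used as a simple stand-in for the ARIMA-based interpolation, so it illustrates the idea of filling gaps rather than the exact method. The sketch assumes the series contains at least one missing value.

```r
# Toy monthly series with gaps (values are made up for illustration).
x <- c(120, NA, 135, 140, NA, NA, 160, 158, 171, 180, 176, 190)

# Option 1: keep only the observations after the last missing value.
last_na <- max(which(is.na(x)))
x_tail  <- x[(last_na + 1):length(x)]

# Option 2 (stand-in): fill gaps by linear interpolation between neighbours.
idx      <- seq_along(x)
observed <- !is.na(x)
x_filled <- approx(x = idx[observed], y = x[observed], xout = idx)$y

x_tail
x_filled
```

Option 1 discards history, which matters for a seasonal series where several full years are needed. Option 2 keeps the full length, at the cost of introducing estimated values.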
3 EDA - Vaud
3.1 Visitors from different countries in Vaud
The graph shows the monthly number of overnight stays in Vaud from tourists of different countries. The period runs from January 2005 to September 2023.
library(ggplot2)
library(plotly)

# Create the ggplot object
plot_vaud <- tourism_vaud %>%
  filter(Herkunftsland != "Herkunftsland - Total") %>%
  ggplot(aes(x = Date, y = value, group = Herkunftsland, color = Herkunftsland,
             text = paste("Country:", Herkunftsland, "Trips:", value))) +  # text feeds the tooltip
  geom_line(show.legend = FALSE) +
  scale_color_viridis_d() +  # Use viridis color palette
  labs(title = "Number of visitors from Each Country to Vaud",
       x = "Date",
       y = "Number of Trips") +
  theme_minimal()

# Convert to an interactive plotly object with specified width and height
interactive_plot <- ggplotly(plot_vaud, tooltip = "text", width = 600, height = 400)

# Adjust plotly settings: remove the legend
interactive_plot <- interactive_plot %>%
  layout(showlegend = FALSE)

# Display the interactive plot
interactive_plot
The time plot reveals some interesting features.
- Tourists come mainly from Switzerland. France is the second most important country of origin.
- There are large dips in the number of overnight stays at the beginning of each year, due to holiday effects.
- There was a sharp drop in 2020, due to the COVID pandemic.
- For Swiss tourists, there is a visible increasing trend both before and after the pandemic.
The next time plot shows the total number of tourists in the canton of Vaud, combining all countries of origin. Here, the seasonal pattern is easier to observe. The number of tourists decreases at the end and beginning of each year and increases in the middle of the year, during the summer holidays. There is also an increasing trend if we set aside the pandemic period in 2020, which caused a sharp drop in travel and therefore in tourism in Vaud. We will come back to this outlier later. Any forecast of this series needs to capture the seasonal pattern, and the fact that the trend changes over the period.
Graphical view of the total number of tourists in the canton of Vaud:
# Keep only the total over all countries of origin
tourism_vaud_total <- tourism_vaud %>%
  filter(Herkunftsland == "Herkunftsland - Total") %>%
  select(-c(Herkunftsland, Kanton, Monat, Jahr))

# Create the ggplot object with the viridis color palette
plot_vaud_total <- tourism_vaud_total %>%
  ggplot(aes(x = Date, y = value)) +
  geom_line(color = viridis::viridis(1)) +  # single viridis color for a single line
  labs(x = "Date", y = "Number of tourists",
       title = "Total number of tourists in canton Vaud") +
  theme_minimal()

# Convert to an interactive plotly object with specified width and height
interactive_plot_total <- ggplotly(plot_vaud_total, width = 600, height = 400)

# Adjust plotly settings: remove the legend, if any
interactive_plot_total <- interactive_plot_total %>%
  layout(showlegend = FALSE)

# Display the interactive plot
interactive_plot_total
3.1.1 Decomposition
We performed an additive decomposition of the time series into three components: trend, seasonality and residual. These components show how each one contributes to the variations observed in the Swiss tourism data.
library(tidyr)

# Convert the data to a monthly time series object
vaud_ts <- tourism_vaud_total %>%
  arrange(Date) %>%
  # Drop rows with a missing 'Date'
  filter(!is.na(Date)) %>%
  # Ensure the data is complete and monthly
  complete(Date = seq.Date(min(Date, na.rm = TRUE), max(Date, na.rm = TRUE), by = "month")) %>%
  # Replace NA values with 0, if there are any
  replace_na(list(value = 0)) %>%
  # Create a time series object
  with(ts(value, frequency = 12, start = decimal_date(min(Date, na.rm = TRUE))))

# Decompose the time series and plot the components
vaud_ts %>%
  decompose() %>%
  plot()
The main insights from this decomposition reflect what we have already observed.
- A clear upward trend until around 2020, when it peaks before falling sharply as a result of the pandemic and travel restrictions.
- Monthly seasonality, with clear and regular fluctuations due to seasonal factors.
- A stable residual component until 2020. After this period, there is a slight increase in volatility which may indicate that other events are having an impact on this time series which are not captured by the first two components.
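The additive structure behind these three components can be checked on a small synthetic example. The sketch below uses base R `decompose()` on a made-up monthly series (not the Vaud data) and verifies that trend, seasonal and residual components add back up to the original series.

```r
set.seed(1)

# Synthetic monthly series: linear trend + fixed seasonal pattern + noise.
n      <- 120                                   # ten years of monthly data
trend  <- seq(100, 220, length.out = n)
season <- rep(10 * sin(2 * pi * (1:12) / 12), times = n / 12)
x <- ts(trend + season + rnorm(n, sd = 2), frequency = 12, start = c(2005, 1))

# Classical additive decomposition.
dec <- decompose(x, type = "additive")

# The components add back up to the series. The first and last six months
# are NA because the trend is a centred moving average.
rebuilt <- dec$trend + dec$seasonal + dec$random
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(x[ok]))

# The twelve seasonal effects are centred, so they sum to (almost) zero.
sum(dec$figure)
```

Because the residual is defined as whatever the trend and seasonal components leave unexplained, an exceptional event such as the 2020 drop necessarily shows up in the residual and, through the moving average, partly in the trend.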
4 EDA - Zurich
4.1 Zurich and All visiting countries
The graph shows the monthly number of overnight stays in Zurich from tourists of different countries.
# Prepare the data
# (tourism_data_zurich already excludes the 'Herkunftsland - Total' rows)
data <- tourism_data_zurich %>%
  filter(!is.na(value)) %>%                              # Remove rows with NA in the 'value' column
  mutate(Monat = month(Date, label = TRUE, abbr = TRUE), # Extract month
         Jahr = year(Date)) %>%                          # Extract year from Date
  group_by(Herkunftsland, Date) %>%                      # Group by country and date
  summarise(Trips = sum(value), .groups = "drop")        # Sum trips for each country per date

# Highlight the Philippines with a separate color
p <- ggplot(data, aes(x = Date, y = Trips, group = Herkunftsland,
                      color = Herkunftsland == "Philippinen",
                      text = paste("Country:", Herkunftsland, "<br>Trips:", Trips))) +  # text feeds the tooltip
  geom_line(show.legend = FALSE) +
  scale_color_viridis_d() +  # Use viridis color palette
  labs(title = "Number of Trips from Each Country to Zurich",
       x = "Date",
       y = "Number of Trips") +
  theme_minimal()

# Convert to an interactive plotly object
interactive_plot <- ggplotly(p, tooltip = "text", width = 600, height = 600)

# Adjust plotly settings
interactive_plot <- interactive_plot %>%
  layout(margin = list(l = 60, r = 60, b = 60, t = 80),  # Adjust margins
         showlegend = FALSE)                             # Hide legend

# Display the interactive plot
interactive_plot
As for Vaud, the most frequent visitors to Zurich are Swiss. Germany and the United States are the two main foreign countries of origin. For Germany, this can be explained by the fact that the canton of Zurich is close to the German border and therefore easy to reach. The same applies to France and the canton of Vaud. The yellow curve represents the Philippines. It is flat and shows a very small number of trips from this country over the period. There is a drastic fall in 2020 caused by COVID-19, which had a significant impact on the tourism industry worldwide. At first glance, there are regular seasonal peaks for most countries, which suggests that tourism in the canton of Zurich is also seasonal.
4.2 Zurich and Philippine Visitors
This graph shows only visitors from the Philippines, as this is the country of interest in our analysis.
# Plot the Philippine series: 'value' on the y axis and 'Date' on the x axis
p <- ggplot(tourism_data_zurich_philippines, aes(x = Date, y = value)) +
  geom_line(color = viridis::viridis(1)) +  # single viridis color for a single line
  labs(title = "Number of Trips from Philippines to Zurich",
       x = "Date",
       y = "Number of Trips") +
  theme_minimal()

# Convert to an interactive plotly object with specified width and height
interactive_plot <- ggplotly(p, width = 600, height = 400)

# Adjust plotly settings: remove the legend, if any
interactive_plot <- interactive_plot %>%
  layout(showlegend = FALSE)

# Display the interactive plot
interactive_plot
4.2.1 Pattern
4.2.1.1 Decomposition
The additive time series decomposition of the monthly overnight stays for tourists coming from the Philippines to the canton of Zurich shows:
- An upward trend until around 2020, when it falls sharply because of the pandemic and travel restrictions. The pandemic had a longer effect on Philippine tourism, which stopped for a longer period (around 2 years or more).
- Multiple peaks in the monthly seasonal component. These fluctuations follow the Philippine calendar. The Philippines start their summer holidays earlier than we do (31 May - 29 July) and have longer La Toussaint holidays (5 October - 18 October - 28 October).
- A residual component with moderate variability, which increases from 2020 onwards. This indicates the influence of unforeseen or exceptional events (such as the pandemic) that disrupted the usual patterns.
4.2.1.2 Seasonality
Seasonal and seasonal sub-series plots make it easier to visualize the monthly fluctuations in each year, from 2005 to 2023.
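library(forecast)

# tourism_ts is assumed to be a monthly ts object of overnight stays from the
# Philippines in Zurich; its construction is not shown in this report.
# Plot the seasonality in one chart, one line per year
ggseasonplot(tourism_ts, year.labels = TRUE, year.labels.left = TRUE) +
  scale_color_viridis_d() +
  theme_minimal()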
# One panel per month to see the seasonality
ggsubseriesplot(tourism_ts) +
  ylab("Number of tourists") +
  xlab("Month") +
  ggtitle("Seasonal subseries plot")

# Alternative for a tsibble: gg_subseries
# tourism_ts %>% gg_subseries(value) +
#   ylab("Number of tourists") + xlab("Month") + ggtitle("Seasonal subseries plot")
The months of May to July and October seem to have visitor peaks, which may indicate a high tourist season during this period. As we saw before, this is due to their calendar. The years 2022 and 2023 show a significant increase in visitor numbers compared with previous years. In particular, the months from May to October 2022 and 2023 show much higher values. This growth may be due to a number of factors, such as a post-pandemic recovery in travel or specific initiatives that have attracted more tourists.
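The peak months read off the plots can also be summarised numerically by averaging the series per calendar month. The sketch below does this in base R on a synthetic series with peaks deliberately placed in June and October. The data and the `month_effect` values are assumptions made for illustration, not the Philippine series.

```r
set.seed(42)

# Synthetic monthly series with built-in peaks in June and October.
month_effect <- c(0, 0, 5, 10, 30, 60, 40, 10, 5, 50, 5, 0)
x <- ts(100 + rep(month_effect, times = 8) + rnorm(96, sd = 3),
        frequency = 12, start = c(2005, 1))

# Average by calendar month: cycle() gives the month position (1..12).
monthly_mean <- tapply(as.numeric(x), as.numeric(cycle(x)), mean)
names(monthly_mean) <- month.abb

# The two months with the highest average.
peak_months <- names(sort(monthly_mean, decreasing = TRUE))[1:2]
peak_months
```

Applied to the real series, the same summary could be computed separately for the years before 2020 and for 2022 to 2023, to check whether the peak months stayed the same after the pandemic.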
4.2.1.3 Trend
There is an upward trend until around 2020, when it falls sharply because of the pandemic and travel restrictions. The pandemic had a longer effect on the Philippine tourism, which stopped for a longer period (around 2 years or more).
5 Modelling
This part is about building on your knowledge of time series techniques to model your data. You can investigate various models but you should justify in your report your choices regarding these. Pay attention to the conditions that are needed to apply a specific model. Treat also carefully seasonality, outliers, colinearity, covariates, special events, etc. Remember the following steps:
Aggregation choice for hierarchical time series
Model building
Model selection
5.1 Total number of visitors in Vaud
5.1.1 Outliers, Correlation, Colinearity, Covariates, Special Events ?
Questions ?
5.1.2 ETS model
This ETS model generates forecasts for the next two years (h = 24) and takes into account the Trend, Errors and Seasonality present in the time series. These three components are additive, which is why it is an AAA model.
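library(forecast)

# Fit an ETS model with additive error, trend and seasonality
ets_vaud <- ets(vaud_ts, model = "AAA")

# Forecast the next 24 months; the forecast object is stored before plotting
# so that it is not overwritten by the return value of plot()
forecast_ets_vaud <- forecast(ets_vaud, h = 24)
plot(forecast_ets_vaud, main = "Forecast of visitors in Vaud",
     xlab = "Date", ylab = "Number of visitors")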
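For reference, the additive error, additive trend, additive seasonality model can be written in the usual innovations state space form. The notation is introduced here and does not appear in the original analysis: $\ell_t$ is the level, $b_t$ the trend, $s_t$ the seasonal component, $m = 12$ the seasonal period, $\varepsilon_t$ the error, and $\alpha$, $\beta$, $\gamma$ the smoothing parameters.

```latex
\begin{align*}
y_t    &= \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t \\
\ell_t &= \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t \\
b_t    &= b_{t-1} + \beta \varepsilon_t \\
s_t    &= s_{t-m} + \gamma \varepsilon_t \\[4pt]
\hat{y}_{t+h \mid t} &= \ell_t + h\, b_t + s_{t+h-m(k+1)},
\qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor
\end{align*}
```

All three components enter the observation equation as a sum, which is what "additive" means here. In a multiplicative seasonal variant, the seasonal term would scale the level and trend instead of being added to them.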
5.1.3 ARIMA
The ARIMA model has been adjusted to take account of the seasonality of the data, which is crucial for time series with pronounced seasonal variations, as is the case here.
# Select a seasonal ARIMA model automatically
arima_vaud <- auto.arima(vaud_ts, seasonal = TRUE)

# Generate forecasts for the next 2 years (24 months)
forecast_arima_vaud <- forecast(arima_vaud, h = 24)

# Plot the forecast
plot(forecast_arima_vaud, main = "ARIMA Forecast for Vaud Tourism",
     xlab = "Date", ylab = "Number of Tourists")
# Fit an ARIMA model with specified parameters
arima_model <- arima(vaud_ts, order = c(5, 1, 0),
                     seasonal = list(order = c(0, 1, 1), period = 12))
forecast_a_vaud <- arima_model %>%
  forecast(h = 24)

# # Plot the forecast
# forecast_a_vaud %>%
#   autoplot(data = vaud_tsibble, main = "ARIMA Forecast for Vaud Tourism", ylab = "Number of Tourists")
#
# # Provide the forecast in a table
# as.data.frame(forecast_a_vaud) %>%
#   kable(caption = "Forecast for Vaud Tourism") %>%
#   kable_styling(full_width = FALSE)
The graph shows that the seasonality observed in the historical data carries over into the forecasts, with recurring seasonal peaks and troughs. This suggests that the ARIMA model captures the seasonal structure well. Forecasts for the next two years (24 months) are represented by the blue line. The grey bands around the blue line represent the prediction intervals, indicating the range within which future values are likely to lie with a given probability. Concerning the trend, the model predicts that the upward trend will continue.
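To show how such bands are obtained from a seasonal ARIMA fit, here is a minimal base R sketch. It uses the built-in `AirPassengers` series as a stand-in for the Vaud data and the classic ARIMA(0,1,1)(0,1,1)[12] specification, which is an assumption for illustration and not the model selected in this report.

```r
# Built-in monthly series used as a stand-in for the tourism data.
y <- log(AirPassengers)

# Seasonal ARIMA(0,1,1)(0,1,1)[12] fitted with stats::arima.
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Point forecasts and standard errors for the next 24 months.
pred <- predict(fit, n.ahead = 24)

# Approximate 95% prediction interval: forecast +/- 1.96 standard errors.
z95   <- qnorm(0.975)
lower <- pred$pred - z95 * pred$se
upper <- pred$pred + z95 * pred$se

# Width of the interval at each horizon.
interval_width <- as.numeric(upper - lower)
round(interval_width[c(1, 12, 24)], 3)
```

The interval widens with the horizon because forecast errors accumulate, which is why bands fan out towards the end of a 24-month forecast.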
5.2 Zurich and Philippines
5.2.1 Forecast without dealing with Covid
5.2.1.1 Naive Forecast
The graph shows the historical number of tourists from the Philippines to Zurich and the forecasts for the next 15 months (h = 15 in the code, October 2023 to December 2024) using the naive model. The series is the total number of overnight stays by tourists from the Philippines in Zurich.
library(fable)
library(tsibble)
library(knitr)
library(kableExtra)

# Convert tourism_ts to a tsibble
tourism_ts <- tourism_ts %>%
  as_tsibble()

# Fit a naive model
fit <- tourism_ts %>%
  model(NAIVE = NAIVE(value))

# Forecast the next 15 months (October 2023 to December 2024)
forecast <- fit %>%
  forecast(h = 15)

# Plot the forecasts with the historical data; the forecast is slightly
# transparent so that it can be told apart from the history
plot <- forecast %>%
  autoplot(tourism_ts, alpha = 0.5) +
  labs(title = "Forecast of tourists from Philippines to Zurich",
       x = "Date",
       y = "Number of tourists") +
  guides(colour = guide_legend(title = "Forecast"))
plot

# Compute the training-set accuracy metrics
metrics_naive <- fit %>%
  accuracy()

# Display accuracy metrics in an HTML table
metrics_naive %>%
  kable("html", caption = "Model Accuracy Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Model Accuracy Metrics

| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| NAIVE | Training | 3.44 | 135 | 79.2 | -18.8 | 48.3 | 0.698 | 0.629 | -0.3 |
This naive model predicts that all future values will be equal to the last observed value in the time series. It does not take past events such as the pandemic into account, and assumes here that the level observed after this extreme fall will remain unchanged. The model does not take trend or seasonality into account either, although both are very present in our case. It is a simplified approach.
The blue areas represent the 80% (darker) and 95% (lighter) prediction intervals of the forecasts. The wider the interval, the greater the uncertainty about the forecast. Here the intervals widen considerably as the horizon grows.
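The mechanics of the naive forecast and its widening intervals can be reproduced in a few lines of base R. The values below are made up, and the interval formula assumes normally distributed, uncorrelated one-step errors, which is the textbook approximation rather than necessarily the exact computation behind the plot.

```r
# Toy series (made-up values) standing in for monthly overnight stays.
y <- c(210, 180, 250, 300, 280, 320, 400, 390, 310, 350)

# Naive point forecast: repeat the last observed value for every horizon.
h <- 15
point_forecast <- rep(tail(y, 1), h)

# Residuals of the naive method are the one-step changes y[t] - y[t-1].
sigma_hat <- sqrt(mean(diff(y)^2))

# The h-step forecast standard deviation grows with sqrt(h).
z95   <- qnorm(0.975)
lower <- point_forecast - z95 * sigma_hat * sqrt(1:h)
upper <- point_forecast + z95 * sigma_hat * sqrt(1:h)

data.frame(horizon = 1:h, forecast = point_forecast, lower = lower, upper = upper)
```

The point forecast is a flat line, while the interval width grows with the square root of the horizon. This explains the fan shape in the plot even though the forecast itself never changes.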
5.2.1.2 Exponential Smoothing
Why additive errors? We use additive errors on the assumption that the variance of the errors is roughly constant over time.
# Fit an ETS model
# Here error("A") is an additive error, trend("A") an additive trend and
# season("A") an additive seasonality; change these if needed
fit <- tourism_ts %>%
  model(ETS_M_seaso = ETS(value ~ error("A") + trend("A") + season("A")))

# Forecast the next 15 months
forecast <- fit %>%
  forecast(h = 15)

# Plot the forecasts with the historical data; the forecast is slightly
# transparent so that it can be told apart from the history
plot <- forecast %>%
  autoplot(tourism_ts, alpha = 0.5) +
  labs(title = "Forecast of tourists from Philippines to Zurich",
       x = "Date",
       y = "Number of tourists") +
  guides(colour = guide_legend(title = "Forecast"))
plot

# Calculate the accuracy on the training set
metrics_ets_AAA <- fit %>%
  accuracy()

# Display accuracy metrics in an HTML table
# Note: this call prints metrics_naive, so the table below shows the NAIVE row
# again; metrics_ets_AAA is probably what was intended here
metrics_naive %>%
  kable("html", caption = "Model Accuracy Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Model Accuracy Metrics

| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| NAIVE | Training | 3.44 | 135 | 79.2 | -18.8 | 48.3 | 0.698 | 0.629 | -0.3 |
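The prediction interval is clearly too wide here, almost as wide as that of the naive forecast.
Why a damped trend and multiplicative seasonality?
- A trend is present, so we use an additive trend (A) and compare it with its damped version (Ad).
- Seasonality is present and grows over time, so multiplicative seasonality (M) was chosen.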
# Compare several models
fit <- tourism_ts %>%
  model(
    ETS_M_seaso    = ETS(value ~ error("A") + trend("A") + season("M")),   # multiplicative seasonality
    ETS_M_seaso_Ad = ETS(value ~ error("A") + trend("Ad") + season("M"))   # damped trend
  )

# Forecast the next 2 years (24 months)
forecast <- fit %>%
  forecast(h = 24)

# Plot the forecasts with the historical data; the forecast is slightly
# transparent so that it can be told apart from the history
plot <- forecast %>%
  autoplot(tourism_ts, level = 90, color = "blue", alpha = 0.5) +
  labs(title = "Forecast of tourists from Philippines to Zurich",
       x = "Date",
       y = "Number of tourists") +
  guides(colour = guide_legend(title = "Forecast"))
plot

# Calculate the accuracy on the training set
metrics_ets <- fit %>%
  accuracy()

# Display accuracy metrics in an HTML table
metrics_ets %>%
  kable("html", caption = "Model Accuracy Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
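Model Accuracy Metrics

| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| ETS_M_seaso | Training | 2.93 | 85.7 | 55.3 | -81.5 | 102 | 0.487 | 0.399 | 0.040 |
| ETS_M_seaso_Ad | Training | 2.06 | 74.7 | 48.9 | -78.7 | 105 | 0.431 | 0.348 | 0.181 |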
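The choice between additive and multiplicative seasonality can be illustrated on a series whose seasonal swings grow with its level. The sketch below uses base R `HoltWinters()` on the built-in `AirPassengers` data as a stand-in: it is not the `ETS()` function used above, it does not offer a damped trend, and the data are not the tourism series. The helper `rmse()` is hypothetical.

```r
# Stand-in data: the built-in AirPassengers series, whose seasonal swings
# grow with the level (not the tourism data used in this report).
fit_add  <- HoltWinters(AirPassengers, seasonal = "additive")
fit_mult <- HoltWinters(AirPassengers, seasonal = "multiplicative")

# Hypothetical helper: in-sample one-step-ahead RMSE of a HoltWinters fit.
rmse <- function(fit) sqrt(fit$SSE / nrow(fit$fitted))
in_sample_rmse <- c(additive = rmse(fit_add), multiplicative = rmse(fit_mult))
in_sample_rmse

# Forecast the next 24 months from the multiplicative fit.
fc_mult <- predict(fit_mult, n.ahead = 24)
```

Comparing in-sample errors in this way mirrors the table above. Note that training-set accuracy tends to favour the more flexible model, so a hold-out period or cross-validation would give a fairer comparison.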
5.2.1.2.1 Chosen Model
# Chosen model: additive error, damped additive trend, multiplicative seasonality
fit <- tourism_ts %>%
  model(ETS_M_seaso = ETS(value ~ error("A") + trend("Ad") + season("M")))

# Forecast the next 2 years (24 months)
forecast <- fit %>%
  forecast(h = 24)

# Plot the forecasts with the historical data; the forecast is slightly
# transparent so that it can be told apart from the history
plot <- forecast %>%
  autoplot(tourism_ts, alpha = 0.5) +
  labs(title = "Forecast of tourists from Philippines to Zurich",
       x = "Date",
       y = "Number of tourists") +
  guides(colour = guide_legend(title = "Forecast"))
plot
5.2.1.3 ARIMA
Question: do we need to difference the data?
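As a reminder of what differencing does, the sketch below applies a seasonal difference and then a first difference to a deterministic toy series made of a linear trend and a fixed 12-month pattern. The series is constructed for illustration only. On real data, the automatic ARIMA procedures choose the differencing orders themselves, and `forecast::ndiffs()` and `forecast::nsdiffs()` can be used to check them.

```r
# Deterministic toy series: linear trend plus a fixed 12-month pattern.
t_idx   <- 1:72
pattern <- c(-20, -15, 0, 10, 30, 40, 35, 20, 5, 15, -10, -25)
x <- ts(50 + 2 * t_idx + rep(pattern, times = 6), frequency = 12)

# A seasonal difference (lag 12) removes the repeating pattern and turns
# the linear trend into a constant (slope 2 times 12 months = 24).
seasonal_diff <- diff(x, lag = 12)

# A further first difference removes that constant.
both_diff <- diff(seasonal_diff, lag = 1)

range(seasonal_diff)
range(both_diff)
```

If a series still shows a trend or a slowly decaying autocorrelation after differencing, a further difference may be needed. If it already fluctuates around a constant level, additional differencing only inflates the variance.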
# Fit an automatic ARIMA model
fit_arima <- tourism_ts %>%
  model(ARIMA_auto = ARIMA(value))

# Forecast the next 2 years (24 months)
forecast_arima <- fit_arima %>%
  forecast(h = 24)

# Plot the forecasts along with the historical data
plot_arima <- forecast_arima %>%
  autoplot(tourism_ts, alpha = 0.5) +
  labs(title = "ARIMA Forecast of Tourists from the Philippines to Zurich",
       x = "Date",
       y = "Number of Tourists") +
  guides(colour = guide_legend(title = "Forecast"))
plot_arima

# Calculate the accuracy on the training set
metrics_arima <- fit_arima %>%
  accuracy()

# Display accuracy metrics in an HTML table
metrics_arima %>%
  kable("html", caption = "Model Accuracy Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Model Accuracy Metrics

| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| ARIMA_auto | Training | 6.74 | 113 | 67.7 | -68.7 | 123 | 0.597 | 0.526 | -0.024 |
The forecast is unsatisfactory: the prediction interval is too wide.
# Use auto.arima with the stepwise and approximation options turned off
# for a more thorough search
fit_updated <- auto.arima(tourism_ts, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)
summary(fit_updated)
#> Series: tourism_ts
#> ARIMA(4,1,0)(1,0,0)[12]
#>
#> Coefficients:
#>          ar1     ar2     ar3     ar4   sar1
#>       -0.407  -0.161  -0.236  -0.270  0.462
#> s.e.   0.064   0.071   0.073   0.066  0.074
#>
#> sigma^2 = 12436:  log likelihood = -1373
#> AIC=2758   AICc=2758   BIC=2778
#>
#> Training set error measures:
#>                ME RMSE  MAE   MPE MAPE  MASE   ACF1
#> Training set 5.55  110 66.9 -77.7  147 0.589 -0.041
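The training-set error measures reported in the summary and in the accuracy tables can be reproduced by hand from the residuals. The sketch below does so in base R on made-up actual and fitted values. The MASE scaling by a lag-12 seasonal naive forecast is the usual convention for monthly data and is assumed here, not confirmed from the output above.

```r
# Made-up actuals and fitted values (two years of monthly data).
actual <- c(120, 135, 150, 160, 155, 170, 180, 175, 190, 200, 210, 205,
            130, 150, 160, 175, 170, 185, 200, 190, 205, 220, 225, 230)
fitted_values <- actual + c(5, -3, 4, -6, 2, -4, 3, -5, 6, -2, 4, -3,
                            -4, 5, -2, 3, -5, 4, -3, 6, -4, 2, -5, 3)
e <- actual - fitted_values                # residuals

ME   <- mean(e)                            # mean error (bias)
RMSE <- sqrt(mean(e^2))                    # root mean squared error
MAE  <- mean(abs(e))                       # mean absolute error
MPE  <- mean(100 * e / actual)             # mean percentage error
MAPE <- mean(abs(100 * e / actual))        # mean absolute percentage error

# MASE scales the MAE by the in-sample MAE of a seasonal naive forecast
# (lag 12 for monthly data); values below 1 beat that benchmark.
naive_scale <- mean(abs(diff(actual, lag = 12)))
MASE <- MAE / naive_scale

round(c(ME = ME, RMSE = RMSE, MAE = MAE, MPE = MPE, MAPE = MAPE, MASE = MASE), 3)
```

Percentage measures divide by the actual value, so months with very few overnight stays, such as those during the pandemic, can inflate MPE and MAPE considerably. This is a likely reason for the MAPE values above 100 in the tables, and it makes scale-based measures such as RMSE, MAE and MASE the safer basis for comparing the models here.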